On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence
Abstract
We provide non-asymptotic bounds for the well-known temporal difference learning algorithm TD(0) with linear function approximators. These include high-probability bounds as well as bounds in expectation. Our analysis suggests that a step-size inversely proportional to the number of iterations cannot guarantee the optimal rate of convergence unless we assume (partial) knowledge of the stationary distribution of the Markov chain underlying the policy considered. We also provide bounds for the iterate-averaged TD(0) variant, which removes the step-size dependency while exhibiting the optimal rate of convergence. Furthermore, we propose a variant of TD(0) with linear approximators that incorporates a centering sequence, and establish that it exhibits an exponential rate of convergence in expectation. We demonstrate the usefulness of our bounds in two synthetic experimental settings.
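As a rough illustration of the setting the abstract describes, the following minimal sketch runs TD(0) with linear function approximation and keeps a running average of the iterates. It is not the paper's algorithm or experimental setup: the two-state Markov reward process, the feature vectors, the step-size schedule `c / n**alpha`, and all names (`td0_linear`, `value`) are assumptions chosen only to make the example small and self-contained. The centered variant is not shown.

```python
import random


def value(theta, phi):
    """Linear value estimate: inner product of parameters and features."""
    return sum(t * f for t, f in zip(theta, phi))


def td0_linear(num_steps=50_000, gamma=0.5, c=0.5, alpha=0.6, seed=0):
    """TD(0) with linear features on a toy 2-state Markov reward process.

    Assumed toy problem (not from the paper): from either state the next
    state is 0 or 1 with probability 1/2, rewards are r(0)=1 and r(1)=0.
    With gamma=0.5 the true values are V(0)=1.5 and V(1)=0.5.
    Returns the last iterate and the running average of all iterates.
    """
    rng = random.Random(seed)
    features = [(1.0, 0.0), (1.0, 1.0)]  # hypothetical feature vectors phi(s)
    rewards = [1.0, 0.0]
    theta = [0.0, 0.0]
    theta_avg = [0.0, 0.0]
    s = 0
    for n in range(1, num_steps + 1):
        s_next = rng.randrange(2)  # uniform transition to state 0 or 1
        phi, phi_next = features[s], features[s_next]
        # Temporal difference error for the observed transition.
        delta = rewards[s] + gamma * value(theta, phi_next) - value(theta, phi)
        step = c / n ** alpha  # assumed diminishing step-size schedule
        theta = [t + step * delta * f for t, f in zip(theta, phi)]
        # Running mean of the iterates (iterate averaging).
        theta_avg = [a + (t - a) / n for a, t in zip(theta_avg, theta)]
        s = s_next
    return theta, theta_avg


if __name__ == "__main__":
    last, avg = td0_linear()
    print("averaged estimate of V(0):", value(avg, (1.0, 0.0)))
    print("averaged estimate of V(1):", value(avg, (1.0, 1.0)))
```

Because the assumed features span both states, the TD(0) fixed point coincides with the true value function in this toy problem, so the averaged estimates can be compared directly against V(0)=1.5 and V(1)=0.5.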
Similar articles
Finite Sample Analysis for TD(0) with Linear Function Approximation
TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such a result. Works that managed to obtain concentration bounds for online Temporal Difference (TD) methods analyzed modified versions of them, carefully crafted f...
Capacity Bounds and High-SNR Capacity of the Additive Exponential Noise Channel With Additive Exponential Interference
Communication in the presence of a priori known interference at the encoder has gained great interest because of its many practical applications. In this paper, the additive exponential noise channel with additive exponential interference (AENC-AEI) known non-causally at the transmitter is introduced as a new variant of such communication scenarios. First, it is shown that the additive Gaussian ch...
A new optimal method of fourth-order convergence for solving nonlinear equations
In this paper, we present a fourth order method for computing simple roots of nonlinear equations by using suitable Taylor and weight function approximation. The method is based on Weerakoon-Fernando method [S. Weerakoon, G.I. Fernando, A variant of Newton's method with third-order convergence, Appl. Math. Lett. 17 (2000) 87-93]. The method is optimal, as it needs three evaluations per iterate,...
On the approximation by Chlodowsky type generalization of (p,q)-Bernstein operators
In the present article, we introduce a Chlodowsky variant of $(p,q)$-Bernstein operators and compute the moments for these operators, which are used in proving our main results. Further, we study some approximation properties of these new operators, which include the rate of convergence using the usual modulus of continuity and also the rate of convergence when the function $f$ belongs to the class Li...
Convergence analysis of the sinc collocation method for integro-differential equations system
In this paper, a numerical solution for a system of linear Fredholm integro-differential equations by means of the sinc method is considered. This approximation reduces the system of integro-differential equations to an explicit system of algebraic equations. The exponential convergence rate $O(e^{-k \sqrt{N}})$ of the method is proved. The analytical results are illustrated with numerical examp...